We propose 'integrated synthetic genetics' as a novel methodology that integrates the reductive and synthetic
approaches used in life science research. Integrated synthetic genetics enables determination of the set of
genes required for the functioning of any biological subsystem. The method uses artificial cell-like
compartments that contain a randomly introduced whole gene library, strictly defined components for in
vitro transcription and translation and a reporter that fluoresces 'only when a particular function of a target
biological subsystem is active.' The set of genes necessary for the target biological subsystem can be
identified by isolating fluorescent artificial cells and performing multiplex next-generation sequencing of the
genes included in these cells. The importance of this methodology is that screening for the set of genes involved
in a subsystem and reconstructing the entire subsystem can be done simultaneously. This methodology can be
applied to any biological subsystem of any species and may remarkably accelerate life science research.



Life science research seeks to elucidate the relationships between genotypes and phenotypes. This typically
involves reductive (genetic and omics research) and synthetic (synthetic biology) approaches. Genetic and
omics research seeks to identify genes involved in a biological subsystem of interest such as transcription,
translation, signal transduction, genome repair and metabolism. This approach identifies individual
genes in a target biological subsystem, but the subsystem as a whole is not readily characterized. To
overcome this, synthetic approaches are used, in which a known set of genes is isolated and combined to
reconstruct a target biological subsystem, with the aim of proving that this set accounts for the entire subsystem.
Life science research combines these two methodologies. However, it is often difficult to fill in information gaps in
a target subsystem on the basis of these approaches, and tremendous amounts of time and effort are required to
completely understand a subsystem's functions.

We propose 'integrated synthetic genetics', a novel approach that combines the advantages of reductive and
synthetic approaches. This method provides for simultaneous high-throughput implementation of screening for
genes involved in a particular subsystem and reconstructing the entire subsystem. The core of this approach is the
use of artificial cell-like compartments that contain an in vitro transcription and translation system (PURE
system). The elements of a PURE system are strictly determined, and a functional protein can be synthesized by
introducing any gene fragment into these artificial compartments. Because as many as 10^ artificial cell-like
compartments can be constructed and the functions of the genes inside these compartments can be evaluated while
maintaining genotype-phenotype associations, they have been used for the directed evolution of proteins and RNAs.

Figure 1 shows an outline of our proposed methodology. First, a whole gene library from a target organism is
prepared and artificial cell-like compartments that contain randomly introduced library components are
constructed (Fig. 1a). Simultaneously, a reporter that fluoresces 'only when a particular function of a target
biological subsystem is active' is introduced (Fig. 1b). Next, a liposome library that contains various combinations
of genes is constructed (Fig. 1c). Because proteins are synthesized from these introduced gene combinations and
express their functions within the PURE system, liposomes will fluoresce if they contain the set of genes required
for the target biological subsystem's function (Fig. 1c). These fluorescent liposomes are isolated by
fluorescence-activated cell sorting (FACS) (Fig. 1d), and the genes in each of the fluorescent liposomes are
determined by multiplex next-generation sequencing (Fig. 1e). Because each liposome also contains many irrelevant
genes, the genes that are common to the fluorescent liposomes are identified by hierarchical cluster analysis
(Fig. 1f). Finally, these common factors are considered to be the set of genes required for the target biological
subsystem. The importance of this methodology is that screening for genes involved in a subsystem and
reconstructing the whole subsystem can be done simultaneously, on the basis of artificial cell-like compartments
that have strictly defined contents and genetics-like screening that begins with a whole gene library. This
methodology is made feasible by the large-scale information processing capacity of next-generation sequencing.

Figure 1 | Strategy used for integrated synthetic genetics. (a) Construction of an artificial cell-like compartment.
The artificial cell contains ribosomes, RNA polymerases, fluorescent reporters and a randomly distributed ORF
library. (b) Fluorescent reporter. The reporter fluoresces only when the target biological subsystem's function is
active. (c) Construction of a liposome library. Reconstruction of the function of the target subsystem occurs in a
subpopulation of the liposome library (shown in blue) that contains the set of genes required for the target
biological subsystem, resulting in fluorescence. (d) Sorting fluorescent liposomes by FACS. (e) Identifying
incorporated genes by multiplex next-generation sequencing. (f) Identifying common factors by hierarchical cluster
analysis.

Results and Discussion 

β-Galactoside hydrolysis subsystem. To demonstrate the feasibility
of our methodology, we selected the Escherichia coli 'β-galactoside
hydrolysis subsystem' as our target. In E. coli, β-galactosidase
encoded by LacZ is necessary and sufficient for β-galactoside
hydrolysis. Thus, this target represents the simplest of model systems.
To detect this subsystem, we used 5-chloromethylfluorescein
di-β-D-galactopyranoside (CMFDG) as the reporter. CMFDG is a
non-fluorescent molecule that contains two galactose moieties and
fluoresces when these galactose moieties are hydrolysed (Fig. 2a).
Using a bulk assay, we verified β-galactoside hydrolysis activity in
the PURE system solution. We mixed LacZ with a T7 promoter
(T7P-LacZ), a PURE system solution and CMFDG and then incubated
this mixture at 37°C. This resulted in intense fluorescence
derived from CMFDG, which indicated a reliable level of activity
with this PURE system (Supplementary Fig. 1).

Construction of the E. coli ORF library. We constructed an E. coli
ORF library comprising 4,123 genes with a T7 promoter, on the basis
of the ASKA library that contains 4,132 E. coli strains harbouring an
E. coli gene plasmid library. To simplify the construction of the E. coli
ORF library, the 4,132 E. coli strains were divided into groups with
approximately 100 strains per group. Each group was cultured in
Luria-Bertani medium, and plasmids from each group were
purified. Gene fragments with a T7 promoter were amplified using
PCR. Finally, equal amounts of the amplified gene fragments were
mixed to prepare the E. coli ORF library. We used deep sequencing to
check the quality of this library. Almost all genes (96.7%) were
sequenced at least once, which assured the library's quality
(Supplementary Fig. 2 and Supplementary Table 1).
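
The library quality check reported here (the fraction of genes sequenced at least once) amounts to a simple count over per-gene read totals. The Python sketch below illustrates that calculation under stated assumptions: the function name, the dictionary layout and the gene identifiers are hypothetical, and the toy counts are not taken from Supplementary Table 1.

```python
def library_coverage(read_counts, min_reads=1):
    """Return the fraction of library genes detected with at least min_reads reads.

    read_counts maps a gene identifier to its number of mapped reads
    (hypothetical layout; genes with no reads must be present with a count of 0).
    """
    if not read_counts:
        raise ValueError("read_counts must contain at least one gene")
    detected = sum(1 for n in read_counts.values() if n >= min_reads)
    return detected / len(read_counts)


# Toy example with made-up identifiers and counts (not data from the study).
toy_counts = {"geneA": 120, "geneB": 0, "geneC": 3, "geneD": 45}
print(f"coverage = {100 * library_coverage(toy_counts):.1f}%")
```

Applied to the real per-gene counts for all 4,123 library genes, a function of this kind would be expected to return the 96.7% figure quoted in the text.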

IVTT reaction in liposomes. We attempted ultrahigh-throughput
reconstruction of the 'β-galactoside hydrolysis subsystem' by starting
with the E. coli ORF library. First, we prepared a liposome library
that contained the PURE system solution, 100 µM CMFDG, 1 µM
transferrin-Alexa Fluor 647 conjugate (volume marker) and 5 nM E.
coli ORF library. Microscopic inspection indicated that the average
diameter and volume of these liposomes were 2.4 µm and 7.2 fL,
respectively (Supplementary Fig. 3), which indicated that approximately 20
genes were randomly incorporated in each liposome. Using the
formula for combinations with repetition, the probability of having
a given target gene among 20 genes randomly chosen from 4,123
genes was 0.48%. Subsequently, these liposomes were incubated at
37°C to allow for gene expression and analysed by FACS. In the FACS
analysis, particles that emitted Alexa Fluor 647-derived fluorescence
were classified as liposomes. CMFDG-derived
fluorescence was not detected in liposomes devoid of the E. coli
ORF library, whereas 0.26% of liposomes that contained the E. coli
ORF library fluoresced (Fig. 2b). This indicated that fluorescent
signals were derived from functioning genes incorporated in these
liposomes. The theoretical and experimental proportions of fluorescent
liposomes (0.48% and 0.26%, respectively) were comparable, agreeing
to within a factor of two. These
results indicated that genes were distributed in a random manner and
that once distributed, the genes in these liposomes were correctly
translated into functional proteins.

Figure 2 | Ultrahigh-throughput reconstruction of the β-galactoside hydrolysis subsystem. (a) Fluorescent reporter
for β-galactoside hydrolysis activity. When LacZ, which hydrolyses β-galactosides, is expressed in a liposome, CMFDG
is hydrolysed and fluoresces. (b) FACS analysis of the liposome library. When liposomes containing the PURE system
solution, CMFDG, a transferrin-Alexa Fluor 647 conjugate and 5 nM E. coli ORF library were constructed, liposomes
that emitted CMFDG-derived fluorescence (upper right quadrant) occurred at a rate of 0.26%. No fluorescence was
observed for liposomes without the E. coli ORF library.
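
The occupancy figures quoted in the 'IVTT reaction in liposomes' paragraph (2.4 µm, 7.2 fL, approximately 20 genes per liposome and a 0.48% chance of carrying a given gene) can be checked with a few lines of arithmetic. The Python sketch below is one way to do so. It assumes spherical liposomes, and it reads 'combination with repetitions' as counting multisets of 20 genes drawn from 4,123 gene types; the exact formula used in the study is not given in the text, and all function names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def sphere_volume_fl(diameter_um):
    """Volume of a sphere in femtolitres (1 cubic micrometre = 1 fL)."""
    radius_um = diameter_um / 2.0
    return (4.0 / 3.0) * math.pi * radius_um ** 3


def copies_per_liposome(conc_nm, volume_fl):
    """Expected number of DNA molecules at conc_nm (nM) in volume_fl (fL)."""
    moles = (conc_nm * 1e-9) * (volume_fl * 1e-15)
    return moles * AVOGADRO


def p_target_present(n_genes, k_drawn):
    """P(a given gene is among k_drawn genes), counting multisets of size
    k_drawn from n_genes gene types (assumed reading of the text)."""
    total = math.comb(n_genes + k_drawn - 1, k_drawn)
    without_target = math.comb(n_genes + k_drawn - 2, k_drawn)
    return 1.0 - without_target / total


volume = sphere_volume_fl(2.4)            # diameter reported in the text
copies = copies_per_liposome(5.0, 7.2)    # 5 nM library in 7.2 fL
p = p_target_present(4123, 20)
print(f"volume ~ {volume:.1f} fL, copies ~ {copies:.0f}, P(target) ~ {100 * p:.2f}%")
```

Under these assumptions the sketch gives about 7.2 fL, about 22 DNA molecules per liposome and 0.48%, in line with the values in the text. The multiset probability simplifies to k / (n + k - 1), that is, 20 / 4,142.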

Multiplex next-generation sequencing. To identify the genes
required for the β-galactoside hydrolysis subsystem, fluorescent
liposomes and control non-fluorescent liposomes were isolated,
and the genes included in them were determined by multiplex
next-generation sequencing (Supplementary Table 2). Each liposome
contained numerous genes irrelevant to the β-galactoside hydrolysis
subsystem. Thus, we used hierarchical cluster analysis of the
genes in the isolated liposomes to detect any specific patterns and to
identify common genes. Hierarchical cluster analysis revealed no
common factor(s) in our control analysis of non-fluorescent
liposomes (Fig. 3a). In contrast, LacZ formed a clear cluster that was
included in all fluorescent liposomes, with no other common
factors (Fig. 3b). This indicated successful ultrahigh-throughput
reconstruction of the β-galactoside hydrolysis subsystem from the
E. coli ORF library by integrated synthetic genetics. Furthermore, the
conclusions drawn from our cluster analysis were confirmed by
additional data showing that virtually all liposomes constructed using LacZ
only were fluorescence positive (Fig. 4). Although we used the
β-galactoside hydrolysis subsystem as a target in this study, similar
ultrahigh-throughput reconstructions can be performed for any
biological subsystem using an appropriate reporter. Integrated
synthetic genetics could in principle be applied to more complex systems such as
cancer. One of the fundamental characteristics of cancer cells is
unlimited cell proliferation, which involves promoting cell survival
and blocking apoptosis. Consistently, it is known that a few key
hallmarks related to apoptosis, cytoskeleton and genomic instability
are significantly enriched in tumor genomic alterations. Reconstruction
and modelling of cancer hallmark-specific networks will
provide insights into cancer therapies.

Figure 3 | (a) Hierarchical cluster analysis for genes included in non-fluorescent liposomes. Ten non-fluorescent
liposomes that did not show β-galactoside hydrolysis activity were isolated, and the genes included in these
liposomes were determined by multiplex next-generation sequencing using HiSeq 2500. In the hierarchical cluster
analysis for these genes, there was no commonality among the genes included in each liposome, which indicated a
random pattern. Each row represents a separate gene, each column represents a separate liposome and genes found in
each liposome are shown in red. (b) Hierarchical cluster analysis for the genes included in fluorescent liposomes.
Hierarchical cluster analysis revealed that LacZ (labelled JW0335 in the figure) formed a clear cluster that was
included in all fluorescent liposomes, with no other common factors.

SCIENTIFIC REPORTS | 4 : 4722 | DOI: 10.1038/srep04722

In conclusion, in this study, we successfully constructed integrated
synthetic genetics as a novel method to integrate reductive and synthetic
approaches. This system combines the advantages of reductive
and synthetic approaches and has three beneficial features
(Supplementary Fig. 4). First, this method provides for simultaneous
high-throughput implementation of screening and reconstruction, which
have been previously used as fundamentally distinct methods in
biological research. Second, if an appropriate reporter is available,
this method provides for ultrahigh-throughput reconstruction of any
biological subsystem of any species. Using a cDNA library, this system
may even be applicable to non-model organisms for which a
whole gene library is unavailable. Third, even when a target subsystem
involves numerous unknown factors, screening for genes that
are necessary and sufficient is feasible using multiplex sequencing.
For example, to address a subsystem that involves five unknown
factors within the context of all E. coli genes, conventional methods
would require 4,123^5 ≈ 10^18 combinations of experiments, which are
practically impossible to implement. With our approach, assuming
that 200 genes are randomly introduced into one liposome, the probability
that this liposome contains all five unknown genes would be approximately
10^-7, which would correspond to a 10^11-fold increase in efficiency
compared with a conventional approach; detection therefore
becomes sufficiently realistic.
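
The orders of magnitude in the five-factor example can be reproduced with a short calculation. The Python sketch below is illustrative only: it assumes that each of the five genes is present in a liposome independently with probability 200/4,123, which is an approximation chosen here and not a formula stated in the text, and the function names are hypothetical.

```python
import math


def exhaustive_combinations(n_genes, n_factors):
    """Number of ordered gene combinations, counted as n_genes ** n_factors."""
    return n_genes ** n_factors


def p_all_factors_present(n_genes, n_factors, genes_per_liposome):
    """Approximate P(all factors are in one liposome), treating each gene as
    independently present with probability genes_per_liposome / n_genes."""
    p_single = genes_per_liposome / n_genes
    return p_single ** n_factors


def order_of_magnitude(x):
    """Exponent of the largest power of ten not exceeding x."""
    return math.floor(math.log10(x))


combos = exhaustive_combinations(4123, 5)
p_hit = p_all_factors_present(4123, 5, 200)
gain = combos * p_hit  # equals genes_per_liposome ** n_factors up to rounding

print(f"combinations ~ 10^{order_of_magnitude(combos)}")
print(f"P(all five)  ~ 10^{order_of_magnitude(p_hit)}")
print(f"gain         ~ 10^{order_of_magnitude(gain)}")
```

Under this approximation the gain reduces to 200^5 = 3.2 × 10^11, because the library size cancels out; the benefit comes entirely from the number of genes co-encapsulated per liposome.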

As demonstrated in this study using the β-galactoside hydrolysis
subsystem, our method identifies genes faster than conventional
methods, even when only a single gene is involved.
In practical terms, our method takes approximately one
week to complete liposome construction, reactions, isolation,
multiplex sequencing and data analysis. Thus, our proposed method
provides a novel approach for life science research and has the potential to
substantially enhance research efficiency.

Methods 

E. coli ORF library. The ASKA library (GFP non-fusion type) contained all 4,132
genes of E. coli and was provided by the NBRP National Institute of Genetics
(Mishima, Japan). All 4,132 genes were amplified by PCR using ASKA library
plasmids as previously described, with some modifications. In brief, the 4,132 E. coli
strains obtained from the NBRP were divided into groups with approximately 100
strains per group. Each group was cultured in 100 mL of Luria-Bertani medium +
chloramphenicol (0.5% w/v yeast extract, 1% w/v tryptone, 1% w/v NaCl and 20 µg/mL
of chloramphenicol). Next, plasmids were extracted from each group and gene
fragments were amplified using the following common primers: ASKA forward
primer, 5'-GGCCTAATACGACTCACTATAGGAGAAATCATAAAAAATTTATTTGCTTTGTGAGCGG-3',
and ASKA reverse primer, 5'-GTTATTGCTCAGCGGTTAGCGGCCGCATAGGCC-3'.
ASKA forward primers contained the T7
promoter sequence (italicized in the original) for gene expression by the PURE system, and ASKA
reverse primers contained the stop codon (italicized in the original). The amplified gene fragments
in each group were purified, and equal amounts were mixed to prepare the E. coli ORF
library with the added T7 promoter. The average length of all E. coli genes was 880 bp;
this value was used to estimate the molarity of the E. coli ORF library.

Figure 4 | β-Galactoside hydrolysis activity in artificial cell-like compartments. Multiplex sequencing identified
LacZ as a common gene that was included in fluorescent liposomes. To verify this, liposomes that contained the PURE
system solution, 100 µM CMFDG, 1 µM transferrin-Alexa Fluor 647 conjugate (volume marker) and 5 nM LacZ were
constructed and assayed for the reproducibility of β-galactoside hydrolysis activity. As a negative control,
liposomes without LacZ were also constructed. Liposomes were incubated at 37°C and analysed by FACS. At 0 h,
CMFDG-derived fluorescence (abscissa) was not detected in either group of liposomes, while after 3 h of incubation,
intense fluorescence was detected only in liposomes that contained 5 nM LacZ, which verified the reconstruction of
β-galactoside hydrolysis activity. Almost all liposomes had shifted to the right and 8.9% of these liposomes were in
the upper right quadrant, which indicated intense fluorescence.
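
The E. coli ORF library paragraph in Methods states that the 880 bp average gene length was used to estimate library molarity, without giving the conversion. A minimal Python sketch of one common conversion follows. It assumes an average mass of 660 g/mol per base pair of double-stranded DNA, a standard approximation that is not stated in the text, and the function names are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (assumed standard approximation)


def dsdna_nanomolar(ng_per_ul, length_bp=880):
    """Convert a dsDNA mass concentration (ng/µL) to molarity (nM)."""
    grams_per_litre = ng_per_ul * 1e-3          # 1 ng/µL equals 1e-3 g/L
    molar_mass = AVG_BP_MASS * length_bp        # g/mol for one fragment
    return grams_per_litre / molar_mass * 1e9   # mol/L converted to nM


def ng_per_ul_for(target_nm, length_bp=880):
    """Mass concentration (ng/µL) needed to reach target_nm (nM)."""
    molar_mass = AVG_BP_MASS * length_bp
    return target_nm * 1e-9 * molar_mass * 1e3  # g/L converted to ng/µL


print(f"5 nM of 880-bp fragments ~ {ng_per_ul_for(5.0):.2f} ng/µL")
```

With these assumptions, the 5 nM library concentration used in the liposome experiments corresponds to roughly 2.9 ng/µL of DNA.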

In vitro transcription and translation (IVTT) system. The IVTT system used in this study,
the PURE system, was purchased from GeneFrontier Corporation (Chiba,
Japan). The composition of the PURE system was previously described. In brief,
the PURE system contained purified ribosomes, translation initiation factors,
elongation factors, release factors, aminoacyl-tRNA synthetases, methionyl-tRNA
transformylase, T7 RNA polymerase and a DnaK chaperone. In addition, this system
contained tRNAs, NTPs, creatine phosphate, 10-formyl-5,6,7,8-tetrahydrofolic acid,
20 amino acids, creatine kinase, myokinase, nucleoside-diphosphate kinase and
pyrophosphatase.

IVTT reaction under bulk conditions. The IVTT reaction solution was prepared by
mixing the PURE system solution, DNA fragments amplified from the ASKA library
and 100 µM 5-chloromethylfluorescein di-β-D-galactopyranoside (CMFDG; Life
Technologies, Carlsbad, CA, USA). CMFDG has two galactose moieties and is one of
the most sensitive substrates for galactosidases. Hydrolysis of non-fluorescent
CMFDG can be monitored by an increase in its fluorescence. The reaction solution
was incubated at 37°C and fluorescent signals were monitored every 10 min at
λex = 490 ± 10 nm and λem = 516 ± 10 nm using an Infinite M1000 fluorescence
microplate reader (TECAN, Männedorf, Switzerland).

IVTT reaction in liposomes. Liposomes were constructed by the water-in-oil
emulsion-transfer method as previously described, with some modifications. In
brief, 1 mL of liquid paraffin containing 250 µg of 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine
(Avanti Polar Lipids, Alabaster, AL, USA) and 25 µg of cholesterol
(Nacalai Tesque, Kyoto, Japan) was mixed with the IVTT reaction solution using a
syringe pump to prepare water-in-oil emulsion droplets. The IVTT reaction solution
contained 20 µL of the PURE system solution supplemented with 5 nM E. coli ORF
library, 200 mM sucrose, 0.5 U/µL of RNase inhibitor (RNasin Plus, Promega,
Madison, WI, USA), 100 µM CMFDG and 1 µM transferrin-Alexa Fluor 647
conjugate (Life Technologies) as a volume marker. The water-in-oil emulsion
droplets were mixed with a magnetic stirrer for 1 min and then equilibrated on ice for
10 min to stabilize the emulsions. The mixture was gently placed in 150 µL of the
PURE system solution that contained 200 mM glucose in a microtube and was
centrifuged at 15,000 × g for 30 min. Prepared liposomes were removed through an
opening at the bottom of the tube. It is important that the liposomes are dispersed in
the PURE system solution to prolong protein production in these liposomes. The
prepared liposomes were incubated at 37°C for protein production.

FACS analysis. A JSAN cell sorter (Bay Bioscience, Hyogo, Japan) and a FACSAria
(Becton Dickinson, Franklin Lakes, NJ, USA) were used for liposome sorting and
analysis. CMFDG-derived fluorescence and Alexa Fluor 647-derived fluorescence
were monitored separately using a dual band pass filter. Among the particles detected
by FACS analysis, those that emitted Alexa Fluor 647-derived fluorescence were
classified as liposomes. CMFDG-derived fluorescent liposomes were classified as
liposomes that contained the genes required for the β-galactoside hydrolysis
subsystem and sorted accordingly. As a negative control, non-fluorescent liposomes
that did not emit CMFDG-derived fluorescence were similarly isolated.
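
The two-step classification described for the FACS analysis (Alexa Fluor 647 signal to define liposomes, then CMFDG signal to define positives and negative controls) can be written as a simple gating routine. The Python sketch below only illustrates that logic: the `Event` type, the function names and the threshold values are hypothetical, and the actual gates used on the JSAN and FACSAria instruments are not given in the text.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One detected particle with its two fluorescence intensities."""
    alexa647: float  # volume-marker signal
    cmfdg: float     # reporter signal


def gate(events, alexa_min, cmfdg_min):
    """Split events into (liposomes, reporter-positive, reporter-negative).

    Particles with alexa647 >= alexa_min count as liposomes; among those,
    cmfdg >= cmfdg_min marks reporter-positive liposomes. Both thresholds
    are illustrative parameters, not values from the study.
    """
    liposomes = [e for e in events if e.alexa647 >= alexa_min]
    positive = [e for e in liposomes if e.cmfdg >= cmfdg_min]
    negative = [e for e in liposomes if e.cmfdg < cmfdg_min]
    return liposomes, positive, negative


def positive_fraction(events, alexa_min, cmfdg_min):
    """Fraction of liposomes that are reporter-positive (0.0 if none gated)."""
    liposomes, positive, _ = gate(events, alexa_min, cmfdg_min)
    return len(positive) / len(liposomes) if liposomes else 0.0


# Toy events with made-up intensities.
toy = [Event(900, 50), Event(850, 1200), Event(10, 2000), Event(700, 30)]
print(positive_fraction(toy, alexa_min=100, cmfdg_min=1000))
```

In the toy data, the third event is excluded as debris because it lacks the volume marker, even though its reporter signal is high; this mirrors the role of the Alexa Fluor 647 conjugate in the text.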

Illumina sequencing. To evaluate the quality of the E. coli ORF library and determine
the genes included in the isolated liposomes, multiplex next-generation sequencing
was done using HiSeq 2500 (Illumina, San Diego, CA, USA). First, gene fragments in
the isolated liposomes were amplified by 45 PCR cycles using the primers noted above
(i.e. ASKA forward and reverse primers) and KOD-Plus- (Toyobo, Osaka, Japan).
The concentrations of amplified gene fragments were determined using Quant-iT
PicoGreen (Life Technologies). Next, the liposome-derived DNA fragments and the
E. coli ORF library were pre-treated for HiSeq 2500 according to the Nextera XT DNA
preparation kit protocol (Illumina). In brief, input DNA was fragmented by a
transposome and the dual indexes were tagged by limited-cycle PCR, which allowed
for discriminating between DNA fragments derived from different samples. Equal
amounts of DNA fragments tagged with dual indexes were mixed for 50-bp single-read
sequencing on HiSeq 2500 in rapid-run mode, and approximately 68 million
mapped reads were obtained. Next-generation sequencing was done by the Genome
Network Analysis Support Facility of Riken.

Data analysis. Mapping of the output data from HiSeq 2500 was done for E. coli ORF
nucleotide sequences obtained from Genobase
(http://ecoli.naist.jp/GB8-dev/index.jsp?page=genome_download.jsp) using Bowtie2. The abundance of ORFeome
clones was quantified using R software. To verify the quality of the E. coli ORF library,
the number of reads for each of the 4,123 genes was counted using sequence data
(Supplementary Table 1). To identify the necessary and sufficient conditions for the
β-galactoside hydrolysis subsystem, the genes in the isolated liposomes were
determined. To remove non-specific mapping, identified genes were listed using a
threshold of 10,000 reads (Supplementary Table 2). Hierarchical cluster analysis
using Cluster 3.0 was used to detect any specific patterns among the genes in
liposomes. To organize clusters, complete linkage was used as the clustering method
and Euclidean distances were used as similarity measures. Java TreeView was used
to visualize the clustering results.
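
The read-count threshold and the search for genes shared by all fluorescent liposomes can be illustrated with a much simpler stand-in for the Cluster 3.0 workflow. The Python sketch below uses plain set intersection in place of hierarchical clustering, so it is not the analysis performed in the study. It assumes that a gene counts as present when it has at least 10,000 reads (the text does not say whether the threshold is inclusive), and every identifier other than JW0335 (LacZ) is made up.

```python
READ_THRESHOLD = 10_000  # reads; inclusive comparison is an assumption


def genes_present(read_counts, threshold=READ_THRESHOLD):
    """Genes of one liposome whose read count reaches the threshold."""
    return {gene for gene, n in read_counts.items() if n >= threshold}


def common_genes(liposomes, threshold=READ_THRESHOLD):
    """Genes present in every liposome of the given list."""
    gene_sets = [genes_present(counts, threshold) for counts in liposomes]
    return set.intersection(*gene_sets) if gene_sets else set()


# Toy read counts per sorted liposome (hypothetical values and identifiers).
fluorescent = [
    {"JW0335": 250_000, "geneA": 40_000, "geneB": 900},
    {"JW0335": 180_000, "geneC": 75_000},
    {"JW0335": 310_000, "geneA": 5_000, "geneD": 22_000},
]
non_fluorescent = [
    {"geneA": 60_000, "geneE": 33_000},
    {"geneC": 48_000, "geneF": 15_000},
]

print(common_genes(fluorescent))
print(common_genes(non_fluorescent))
```

Set intersection only finds genes shared by every sample, whereas hierarchical clustering with complete linkage and Euclidean distances also exposes partial patterns; the sketch is meant to convey the selection logic, not to replace the clustering step.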